MATRIX-FREE APPROXIMATE EQUILIBRATION 

ANDREW M. BRADLEY**§ AND WALTER MURRAY^ t 

Abstract. The condition number of a diagonally scaled matrix, for appropriately chosen scaling 
matrices, is often less than that of the original. Equilibration scales a matrix so that the scaled 
matrix's row and column norms are equal. Scaling can be approximate. We develop approximate 
_L ^ ■ equilibration algorithms for nonsymmetric and symmetric matrices having signed elements that ac- 

cess a matrix only by matrix- vector products. 
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1. Introduction. For a square, nonnegative, real, nonsymmetric matrix B, 
equilibration in the 1-norm finds x,y > such that XBy = e and YB T x = e, where 
X = diag(x) and similarly for other vectors, and e is the vector of all ones. Hence 
XBY is doubly stochastic. For a symmetric matrix, symmetric equilibration finds 
x > such that XBx = e. If B = A o A for A a real, possibly signed, matrix, where o 
denotes the element-wise product, then these equations equilibrate A in the 2-norm. 
Equilibration in the 2-norm is often called binormalization. Approximate equilibra- 
tion scales a matrix so that its row and column norms are almost equal. Both the 
exactly and approximately equilibrated matrices often have smaller condition num- 
bers than the original. In this paper we always use the 2-norm condition number. 
Equilibration is particularly usefully applied to matrices for which simpler diagonal 
£SJ ■ scaling methods fail: for example, to indefinite symmetric matrices. In Section [21 we 

^ \ compare equilibration with Jacobi scaling when applied to symmetric matrices. 

In some problems, accessing elements of a matrix is expensive. What are often 
called matrix-free algorithms access a matrix only by matrix-vector products. If A is 
■ a matrix having nonnegative elements, then many algorithms already exist to equili- 

brate A using only matrix-vector products: for example, the Sinkhorn-Knopp itera- 
tion. But if A has signed elements, then one must obtain \ A\ to use these algorithms, 
which requires accessing the elements of A. In Section [3J we develop matrix- free ap- 
proximate equilibration algorithms for square nonsymmetric and symmetric matrices 
having signed elements, and we report the results of numerical experiments with these 
algorithms in Section 0] 

1_h " 2. Diagonal scaling of symmetric matrices. Jacobi scaling pre- and post- 
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multiplies a square, usually symmetric positive definite (spd) matrix by a diagonal 
matrix such that the scaled matrix has unit diagonal elements. 

Numerical experiments show that the condition number of the equilibrated or 
Jacobi-scaled matrix is often considerably less than that of the original matrix. Figure 
12.11 shows the results of a numerical experiment using 323 symmetric matrices from 
the University of Florida Sparse Matrix Collection [7]; see Section [4] for further details 
on the data set. The matrices used in this experiment have sizes 10 to 36441, with 
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Fig. 2.1. Numerical study of conditioning of symmetric matrices. The four plots show condition 
number of the Jacobi-scaled (top) and binormalized (bottom) symmetric positive definite (left) and 
indefinite (right) matrix as a function of the condition number of the unsealed matrix. 



a median size of 5000. Figure [2~T1 shows the condition number of the scaled matrix 
as a function of that of the unsealed matrix. Two diagonal scaling methods are used: 
Jacobi (top) and binormalization (bottom). Matrices are divided into positive definite 
(left) and indefinite (right). 

This experiment shows that if a matrix is spd, then equilibration and Jacobi 
scaling reduce the condition number by about the same amount; indeed, the two 
corresponding plots are almost identical. It also shows that when the two methods are 
applied to an indefinite matrix — in the case of Jacobi scaling, replacing a zero diagonal 
element with a one — the condition number of the Jacobi-scaled matrix is likely to 
be substantially greater than that of the equilibrated matrix. For these reasons, 
equilibration of symmetric indefinite matrices can be thought of as a generalization 
of Jacobi scaling of spd matrices, raising the question of the relationship between the 
two scaling methods when applied to spd matrices. 

Let A be an n x n spd matrix whose diagonal elements are all one. Let k(-) 
denote the 2-norm condition number of a matrix. Van der Sluis showed that k(A) < 
n min^ n(DAD) (Theorem 4.1 of [21]) and that if A has at most q nonzero elements in 
any row, then n(A) < qmin^ k(DAD) (Theorem 4.3 of [H]). A matrix C has Young's 
property A if there exists a permutation matrix P such that 

pcpT =& S) 



and D\ and D2 are square diagonal matrices. Forsthye and Straus showed that if the 
matrix A has in addition Young's property A, then k(A) = min^ k(DAD) (Theorem 
4 of [9]). In summary, these three theorems state that Jacobi scaling is within a factor 
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of n, q, or 1 of optimal among all diagonal scaling matrices. 

If A is spd, then so is B = A o A by the Schur Product Theorem (see. for 
example, Theorem 7.5.3 of |10j). Suppose A has unit diagonal elements. Then so 
docs B. Moreover, By < 1 for i ^ j. Suppose Jacobi scaling — replacing a zero 
diagonal element with a one — has been applied to an n x n symmetric matrix A to 
yield the matrix A, and again let B = Ao A. Consider the vector of row sums ,s = Be. 
If A is indefinite, < Si < oo. If A is spd, as every diagonal element of B is 1, Si > 1; 
and as every off-diagonal element By < 1, Sj < n. 

Let n(v) be the mean of the elements of an n-vector v and var(w) the variance: 
var(w) = n^ 1 — mC 11 )) 2 - If a matrix is binormalized, then the variance of the 

vector of its row 2-norms is 0. If A is indefinite, var(s) can be arbitrarily large. But 
if A is spd, then var(s) < (n — l) 2 . For as each 1 < Si < n, (si — /i(s)) 2 < {n — l) 2 , 
and so n -1 ^(s* ~ m( s )) 2 < _ !) 2 = ( n _ I) 2 - 

From the other direction, an immediate corollary of inequality 2 in |15j is that 
if an spd matrix A is equilibrated in the 2-norm to form A, then n~ x / 2 < An < 1 
(the upper bound follows immediately from equilibration to unit row and column 
1 -norms); if A is indefinite, then of course — 1 < An < 1. 

In summary, if a matrix is spd, Jacobi scaling produces a matrix that is not 
arbitrarily far from being binormalized, and binormalization produces a matrix whose 
diagonal elements are bounded below and above by positive numbers. The bounds 
depend on the size of the matrix. If a matrix is symmetric indefinite, then neither 
statement holds. 

3. Algorithms. Sinkhorn and Knopp analyzed the convergence properties of 
the iteration (|3.1j) : 

r k+1 = (Bc k )-\ c k+l = {B T r k+1 )-\ (3.1) 

The reciprocal is applied by element. c° is a vector whose elements are all positive. 
According to Knight [12], the iteration was used as early as the 1930s. 

Parlett and Landis [T7] generalized Sinkhorn and Knopp's convergence analysis 
and developed several new algorithms, one of which, EQ, substantially outperformed 
the Sinkhorn-Knopp iteration on a test set. Khachiyan and Kalantari |llj used New- 
ton's method to scale positive semidefinite symmetric matrices. Livne and Golub 
|16j developed algorithms for symmetric and nonsymmetric matrices based on the 
Gauss-Seidel-Newton method. Knight and Ruiz [13] devised an algorithm based on 
an inexact Newton method that uses the conjugate gradients iteration. 

Nonuniqueness of equilibration in the infinity norm motivates multiple algorithms 
that consider both efficiency and quality of the scaling under criteria other than the 
infinity norms of the rows and columns. A matrix can be scaled in the infinity norm 
if it has no zero rows or columns. The simplest nonsymmetric algorithm is first to 
scale the rows (columns), then to scale the columns (rows). After the first scaling, the 
largest number in the matrix is 1, and the second scaling cannot produce numbers 
that are larger than 1. Therefore, scaling is achieved after one iteration. Bunch 
[3] developed an algorithm that equilibrates any symmetric matrix in the infinity 
norm. More recently, Ruiz |19| developed another iteration that compares favorably 
with Bunch's algorithm. He extended the method to 1- and 2-norms and showed 
convergence results for these algorithms as strong as, and based on, those by Parlett 
and Landis |17] for their algorithms. 

Each of these algorithms is iterative and yields a sequence of matrices converging 
to a doubly stochastic matrix. A user can terminate the iteration early to yield an 
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approximately equilibrated matrix; hence these algorithms may be viewed as approx- 
imate equilibration algorithms. 

To date, it appears that all scaling algorithms for matrices having signed elements 
require access to the elements of the matrix. If A is nonnegative, the situation is much 
different; for example, the Sinkhorn-Knopp algorithm requires only the matrix- vector 
products (mvp) Ax and A T x. For general matrices, algorithms need at least mvp of 
the form \A\x (1-norm), {AoA)x (2-norm), or similar expressions, and their transposes. 
We introduce approximate scaling algorithms for equilibration in the 2-norm that 
require only the mvp Ax and A T x, where x is a random vector. Algorithms that 
compute the mvp with a random vector have been developed to solve other problems. 
For example, Bekas, Kokiopoulou, and Saad [T] developed a method to estimate the 
diagonal elements of a matrix; and Chen and Dcmmcl [5] , to balance a matrix prior 
to computing its eigenvalues. Our algorithms also have a connection to the methods 
of stochastic approximation |14j . 

We want to emphasize that because the algorithms we propose access a matrix 
having signed elements only through a sequence of mvp, we cannot expect them to 
be faster than, or even as fast as, algorithms that access the elements directly when 
applied to matrices for which direct access to the elements is possible and efficient. 
Our algorithms are useful only if a matrix has signed elements that are impossible or 
inefficient to access directly; it appears there are not algorithms already available to 
solve this problem. It is also desirable that only a small number, relative to the size 
of the matrix, of mvp are required. 

3.1. Existence and uniqueness. A matrix has support if a positive main diag- 
onal exists under a column permutation; a matrix having this property is equivalcntly 
said to be structurally nonsingular [8]. A square matrix has total support if every 
nonzero element occurs in the positive main diagonal under a column permutation. 
A matrix has total support if and only if there exists a doubly stochastic matrix 
having the same zero pattern |18j . A matrix A is partly decomposable if there exist 
permutation matrices P and Q such that 



where E and D are square matrices. A square matrix is fully indecomposable if it 
is not partly decomposable. A fully indecomposable matrix has total support [2]. A 
matrix A is reducible if there exists a permutation matrix P such that PAP T has 
the matrix structure in (|3.2[) ; otherwise, A is irreducible. For convenience, a matrix 
is said to be scalable if it can be equilibrated. 

Theorem 3.1 (Sinkhorn and Knopp |20j ) . Let B be a nonnegative square ma- 
trix. 

1. There exist positive diagonal matrices R and C such that F = RBC is doubly 
stochastic — briefly, B is scalable — if and only if B has total support. 

2. If B is scalable, then F is unique. 

3. R and C are unique up to a scalar multiple if and only if B is fully indecom- 
posable. 

4. The Sinkhorn-Knopp iteration yields a sequence of matrices that converges to 
a unique doubly stochastic matrix, for all initial r, c > 0, if and only if B has support. 
If B has support that is not total, then R and C have elements that diverge. 

Parts 1-3 were independently discovered in [3J. 

Theorem 3.2 (Csima and Datta [B]). A symmetric matrix is symmetrically 




(3.2) 
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scalable if and only if it has total support. 

The necessary and sufficient condition of total support in Theorem l3.2l is identical 
to that in part f of Theorem 13. f I The necessary part follows directly from part 1, but 
proving the sufficiency part requires several steps not needed in the nonsymmctric 
case. 

Section 3 of [T^] discusses the symmetric iteration 

x k+1 = (Bx k )" 1 (3.3) 

for symmetric B and sketches a proof of convergence. Not directly addressed is that 
the iterates x k can oscillate and reducible B. 

If B is irreducible, this oscillation is straightforward and benign. The resulting 
scaled matrix is a scalar multiple of a doubly stochastic one. For example, suppose 
B = 1 and x° = 2. Then for k even, x k = 2, and for k odd, x k = 1/2. In general, if 
symmetric B is irreducible, X k BX k+1 converges to a doubly stochastic matrix, while 
X 2k BX 2k and x 2k+1 BX 2k+1 converge to scalar multiples of a doubly stochastic 
matrix, and these scalars are reciprocals of each other. 

Somewhat more complicated is reducible B. For example, consider the matrix 
B = diag(l 2) T . If a; = e, the even iterates remain e while the odd iterates are v = 
(1 1/2) T . IBV is doubly stochastic, but v is not proportional to e. Moreover, VBV is 
not simply a scalar multiple of a doubly stochastic matrix. This nonconvergence is also 
benign. A reducible symmetric matrix B can be symmetrically permuted to be block 
diagonal with each block irreducible. Hence the equilibration problem is decoupled 
into as many smaller problems. We can construct a symmetric equilibrating vector x 
from the nonsymmetric equilibrating vectors r and c by setting x = \frc. For suppose 
r and c equilibrate B by RBC . Let T be the indices corresponding to an irreducible 
block. Then r(I) oc c(I) and the block X(1,I)B(I,1)X(1,X) is doubly stochastic. 
For B, the symmetric equilibration vector is sfev = (1 l/V2) T . 

These observations suggest that we should write the symmetric Sinkhorn-Knopp 
iteration as 

y k+1 = {By k )-\ z fe+1 = A /^V. (3.4) 

Since x k does not actually play a role in the iteration, in practice, the square root 
operation needs to be applied only after the final iteration to yield the scaling matrix. 

3.2. Stochastic equilibration. Our algorithms are based on the Sinkhorn- 
Knopp iteration. The Sinkhorn-Knopp iteration performs the mvp Bx and B T x for a 
nonnegative matrix B. If A is a matrix having signed elements, then Bij = \ Aij\ p for 
p > 1 for equilibration in the p-norm, and so B is not available if one does not have 
access to the elements of A. The key idea, similar to that in [5], in our algorithms is to 
compute Bx approximately by using an mvp with A rather than B, where B = Ao A, 
and similarly for B T x. 

Let a £ M. n . If the elements of the random vector u £ M. n have zero mean, positive 
and finite variance, and are iid, then E {a T u) 2 = 77E a T a for finite 77 > 0, where E 
denotes expectation. For as Emuj = if i 7^ j, E a j u j) 2 = E J^j a | u j = a % 
where r\ — Eu 2 > is finite. See [5] for more on this and related expectations. We 
use this fact to approximate Bx by computing the mvp AX x ^ 2 u: 



E (AX^ 2 u) 2 = ^{{AX 1 ' 2 ) o (AX 1 / 2 ))e = r){A o A)Xe = v Bx. (3.5) 
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To increase the accuracy of the approximation to Bx, one could compute the mean 
of mult iple mvp AX l / 2 u. Then one could construct an approximate scaling algorithm 
by replacing the exact computation Bx with this estimate, and similarly for B T x, in 
the Sinkhorn-Knopp algorithm. However, the method of stochastic approximation 
[14] suggests a better approach. In stochastic approximation, the exact iteration 
x k+1 = x k + oj k f(x k ) is replaced by the stochastic iteration x k+1 = x k + u> k f(x k ), 
where E/(x fe ) = f(x k ) and x is sought such that f(x) = 0. Rather than explicitly 
average multiple realizations of f(x k ) at each iteration, the stochastic approximation 
iteration controls the relative weight of / through uj k and alters the iterate x at each 
evaluation of /. 

Let p = r^ 1 , 7 = c _1 , and < u) k < 1. Consider the iteration 

P fc+1 = (l-- fe )^ F +- fe ^ F (3-6) 
\\p*\\i ||-Bc fc ||i 

k QT r k+l 



l k+1 = (l-^ fc )^— +uj k 



jk^ USTYfc+l^- 

This iteration takes a convex combination of the reciprocal of an iterate and the 
Sinkhorn-Knopp update when each is normalized by its 1-norm. Let u k and v k be 
random vectors as before. For the vector x, (x) 2 is the element-wise square. Substi- 
tuting (|3.5|) into this iteration, we obtain the stochastic iteration 

y k = (A(C k ) 1/2 u k ) 2 

Hp Hi Wv h 

z k = (A T {R k+1 ) 1/2 v k ) 2 

-.k „k 

<M k+1 — n , ' k \ 7 4- , - k 

Tl Hi Ik Ik 

We implement this iteration in the Matlab function snbin. 

function [r c] = snbin (A, nmv ,m, n) 

% Stochastic matrix— free binormalization for nonsymmetric real A. 
% A is a matrix or function handle. If it is a function handle, 
% then v = A(x) returns A*x and v = A(x, 'trans ') returns A'*x. 

% nmv is the number of forward and transpose matrix — vector 
% product pairs to perform . 

% m,n is the size of the matrix. It is necessary to specify 
% these only if A is a function handle . 

% diag(r) A diag(c) is approximately binormalized (to a scalar). 
op = isa (A, 'function_handle '); 
if ("op) [m n] = size (A); end 
r = ones (m, 1 ) ; c = ones(n,l); 
for ( k = 1 : nmv) 
% omega "k 

alpha = (k — l)/nmv; 

omega = (1 — alpha)*l/2 + alpha*l/nmv; 
% rows 

s = randn (n , 1 ) . / sqrt ( c ) ; 

if(op) y=A(s); else y= A*s; end 

r = (1 — omega )* r /sum( r ) + omega *y . * 2 /sum( y . * 2 ) ; 

% columns 

s = randn (m, 1 ) . / sqrt ( r ) ; 

if(op) y=A(s, 'trans'); else y= ( s ' * A) ' ; end 
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c = (1 — omega )* c /sum( c ) + omega *y . * 2 /sum( y . * 2 ) ; 
end 

r = l./sqrt(r); c = l./sqrt(c); 

Our choice of the sequence u> k is based on numerical experiments; the sequence en- 
courages large changes in c?/||d||i when k is small and smaller changes when k is 
large. 

Iteration (|3.6p forms a linear combination of p k and Bc k . One might consider 
instead forming a linear combination of r k and (Bc k )~ 1 . In the iteration we use, 
a reciprocal is taken after forming a linear combination of the iterate and a random 
quantity; in contrast, in this alternative, it is taken before, and of the random quantity. 
Consequently, the stochastic iteration corresponding to this alternative iteration is less 
stable than ([577]) . 

A straightforward algorithm for the symmetric problem applies snbin to the 
symmetric matrix B and then returns \Jrc. But numerical experiments suggest we 
can do better. For irreducible matrices, the denominators ||rf fc ||i and ||.Ba; ||i in the 
iteration 



d k+1 = {l-uj k )- 



d k k Bx k 

\d k \a +U \\Bx k \\x 

remove the benign oscillation we observed in Section [3. II therefore, adjacent iterates, 
rather than every other one as in (|3.6[) . can be combined in a convex sum. This second 
approach speeds convergence. But it is not sufficient when applied to reducible ma- 
trices. Numerical experiments support using the second approach for early iterations, 
when making progress quickly is important, and then switching to the first approach 
to refine the scaling matrix. We implement this strategy in ssbin. 

function x = s s b i n ( A, nmv , n ) 

% Stochastic matrix— fr ee binormalization for symmetric real A. 

% A is a symmetric real matrix or function handle. If it is a 

% function handle, then v = A(x) returns A*x. 

% nmv is the number of matrix — vector products to perform . 

% [n] is the size of the matrix. It is necessary to specify n 

% only if A is a function handle . 

% diag (x) A diag(x) is approximately binormalized (to a scalar). 
op = isa(A, 'function_handle'); 
if(~op) n = size(A,l); end 
d = ones (n , 1 ) ; dp = d; 
for ( k = 1 : nmv) 

% Approximate matrix — vector product 

u = randn ( n , 1 ) ; 

S = U. / sqrt ( dp ) ; 

if(op) y=A(s); else y= A*s; end 
% omega ~k 

alpha = (k — l)/nmv; 

omega = (1 — alpha)*l/2 + alpha*l/nmv; 
% Iteration 

d = (1 — omega )* d/sum( d ) + omega*y . * 2 /sum( y . * 2 ) ; 
if ( k < min (32,floor( nmv / 2 ) ) ) % Ignore r e du cib ility . 
dp = d; 

else % This block makes ssbin behave like snbin. 

tmp = dp ; dp = d ; d = tmp ; % Swap dp and d . 

end 
end 

x = 1 . / ( d . * dp ) . * ( 1 / 4 ) ; % In case B is reducible 

The final line implements the square root in (|3.4[) . 
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Fig. 4.1. Convergence histories for two symmetric problems having sizes 3564 fl e ft) an d 226340 
(right). Ten solid lines show individual realizations of the algorithm. The dashed line corresponds 
to snbin. The dotted line corresponds to not switching to snbin-Kfce behavior. 
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Fig. 4.2. Ratio for the original and scaled nonsymmetric matrix vs. matrix size N, after the 
indicated number of iterations, for 741 matrices. 



In most iterative algorithms, a measure of the merit of an iterate that requires 
little work to evaluate relative to the work in an iteration influences the behavior 
of the algorithm. In our algorithms, any procedure to assess the merit of an iterate 
would require additional mvp, likely wasting work. Hence the parameter values in the 
loop of each algorithm are fixed independent of problem. 

4. Numerical experiments. In our numerical experiments, two quantities of 
the scaled matrices are measured: condition number if the matrix is not too large; 
and the ratio of the largest to smallest row 2-norms (in the nonsymmetric case, row 
or column, depending on which gives a larger number), hereafter designated as the 
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original matrix for 519 matrices (matrices having N < 2 X lO 4 ^. 
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Fig. 4.4. Ratios for 466 symmetric matrices. Results for only K = 128 are shown; trends in 
K follow those for the nonsymmetric problems. 



ratio. 

We test snbin and ssbin in Matlab on matrices in the University of Florida 
Sparse Matrix Collection [7]; these are obtained by the following queries: 

index = UFgct ( ' refresh ' ) ; 

% Symmetric 

sids = find (~ index . isBinary & index . numerical_sy mmetry = 1 

index . sprank = index. nrows & index . isReal ) ; 
% Square nonsymmetric 

nids = find ( ~ index . isBinary & index . numerical_sy mmetry < 1 
index. nrows = index, ncols 

index, sprank = index, nrows &; index . isReal ) ; 
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Fig. 4.5. Condition numbers for 221 symmetric matrices (matrices having N < 2 X lO 4 ^. 
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Fig. 4.6. ^4 closer look at the ratio as a function of N for K = 128 iterations for nonsymmetric 
(left) and symmetric (right) matrices. 



First we investigate the behavior of ssbin on two problems. Figure I4TT1 shows the 
convergence history, starting with the unaltered matrix, for 12 runs of ssbin on two 
symmetric problems. The smaller has size 3564; the larger has size 226340. log 10 ratio 
is used to measure convergence. The ten closely clustered solid curves correspond to 
the nominal algorithm. The dashed curve indicates the slower convergence of simply 
applying snbin. The dotted curve shows the problem with not eventually switching 
to snbin-like behavior to address reducibility. The plateau in the solid curves ends 
at iteration 32, when the switch is made. The ten solid curves are closely clustered, 
indicating the variance in the algorithm's output is small for any number of requested 
mvp. 

In the performance experiments, for each matrix, the algorithm is run five times 
for K = 32, 64, and 128 iterations. Results are shown in Figures 14.21 and 14.31 for 
nonsymmetric matrices and 14.41 and 14.51 for symmetric matrices. Figure 14.21 shows 
that the ratio tends to decrease with K, as one expects. The ratio for the scaled 
problem, given fixed K, grows slowly with problem size TV. Figure 14.61 investigates 
this aspect more closely. It shows details for the case of K = 128 iterations for 
nonsymmetric (left) and symmetric (right) matrices. Over a range of matrix sizes of 
more than six orders of magnitude, the final ratio often ranges from between 1.5 and 
6. Figures |4~31 and |4~51 show that the condition number of the scaled matrix is almost 
always, and often substantially, smaller than that of the original matrix: any point 
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that falls below the diagonal line corresponds to a reduction in condition number. The 
top- left plots of Figures POl and B~o1 show the condition numbers of the exactly scaled 
matrices; the ratios are 1, of course. In the plots corresponding to the stochastic 
algorithms, what appears to be a point is in fact a cluster of the five points resulting 
from the five separate runs. The tightness of these clusters again implies that the 
variance of the outputs of these algorithms is quite small. 

These experiments suggest that ssbin and snbin arc effective matrix-free approx- 
imate equilibration algorithms: a small number — relative to the size of the matrix — of 
matrix-vector products is sufficient to approximately equilibrate the matrix. One ap- 
plication is to scale a matrix whose elements require too much work to access directly 
prior to using a Krylov-subspace iteration to solve a linear system. We recommend 
performing approximately 100 iterations, which corresponds to 100 matrix-vector 
products in the symmetric case and 200 in the nonsymmetric. 
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